Hypothetical Validation Campaign

■ 10,000M mile simulation campaign
  ● Goal: under 1 fatality per billion miles
  ● Claim: ~5-10x better than human
■ 100M miles of collected data/scenarios
  ● Claim: simulating this is representative
■ 10M miles of road testing of final software
  ● Claim: this validates the simulation








[Figure label: Data Collection]


■ Is this statistically valid?
  ● Questionable confidence in collected data
  ● Road testing useful, but insufficient on its own
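
One way to see why 10M miles of road testing is insufficient on its own is a back-of-the-envelope confidence bound. The sketch below assumes fatalities follow a Poisson process with a constant per-mile rate and that testing observes zero events; both are simplifying assumptions of this example, not claims from the slides. The function names are hypothetical.

```python
import math


def rate_upper_bound(miles: float, confidence: float = 0.95) -> float:
    """Upper confidence bound on events per mile after `miles` with zero events.

    Assumes a Poisson process with constant rate (an assumption of this sketch).
    """
    return -math.log(1.0 - confidence) / miles


def miles_needed(target_rate: float, confidence: float = 0.95) -> float:
    """Event-free miles needed to bound the rate below `target_rate`."""
    return -math.log(1.0 - confidence) / target_rate


if __name__ == "__main__":
    goal = 1e-9  # 1 fatality per billion miles, the goal on the slide
    print(f"Event-free miles needed for the goal: {miles_needed(goal):.3g}")
    bound = rate_upper_bound(10e6) * 1e9
    print(f"Bound after 10M event-free miles: {bound:.0f} per billion miles")
```

Under these assumptions, roughly 3 billion event-free miles would be needed to support the goal at 95% confidence, while 10M event-free miles only bound the rate at about 300 fatalities per billion miles.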

How Much Do You Trust Validation?

■ Would you put a child in front of this self-driving car?
  ● 10,000M miles of simulation
    ... perhaps with a simulator error?
  ● 100M miles of collected data
    ... perhaps with scenario analysis errors?
  ● 10M miles of road testing
    ... that missed the above errors?
  ● 10K repetitions of closed-course testing
    ... with standard dummies instead of people?
  ● With biased perception training data?
  ● Built from software binaries & tools
    ... with no safety qualification?

© 2021 Philip Koopman

Engineering Rigor

■ Testing alone is insufficient for life-critical systems
  ● So we also use engineering rigor

■ Can you trust the system itself?
  ● Is it engineered for safety?
  ● Were standards and best practices used?
  ● Is there a safety case documenting all this?





■ Can you trust your validation process?
  ● Did you engineer the simulations properly?
  ● Did you design the validation campaign properly?

Field Engineering Feedback

■ Expected risk has a mean + uncertainty
  ● You should deploy only when the mean is acceptable








  ● But there will be uncertainty
    - Missed edge cases during road testing
    - Unknown gaps in validation plan
    - Unknown unknowns in general
■ Solution: continuous field monitoring
  ● Monitor Safety Performance Indicators (SPIs)
    - SPI violation means safety argument has a defect
    - Investigate and fix root causes before loss events
  ● Start during validation; continue after deployment
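
The monitoring idea above can be sketched as a simple statistical check. This is a minimal illustration, not the method from the slides: it assumes each SPI carries a maximum event rate claimed in the safety case, that events are Poisson distributed, and that a violation is flagged when the observed count would be improbable if the claim held. The `SPI` class, `is_violated`, and the example indicator are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class SPI:
    """Hypothetical Safety Performance Indicator with a claimed rate bound."""
    name: str
    max_rate_per_mile: float  # bound asserted in the safety case (assumed)


def poisson_tail(k: int, mean: float) -> float:
    """Return P(X >= k) for X ~ Poisson(mean)."""
    if k <= 0:
        return 1.0
    term = math.exp(-mean)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= mean / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def is_violated(spi: SPI, events: int, miles: float, alpha: float = 0.05) -> bool:
    """Flag the SPI if `events` in `miles` is unlikely under the claimed bound."""
    expected = spi.max_rate_per_mile * miles
    return poisson_tail(events, expected) < alpha


if __name__ == "__main__":
    # Hypothetical indicator: at most 1 hard-braking event per 100,000 miles.
    spi = SPI("hard_braking", max_rate_per_mile=1e-5)
    for observed in (10, 25):
        print(observed, "events in 1M miles ->", is_violated(spi, observed, 1_000_000))
```

A flagged SPI here is a prompt to investigate the safety argument, consistent with the point that a violation signals a defect to fix before a loss event occurs.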



Safety Culture

■ Did you do what you said you did?
  ● Did your validation skip over known problems?
  ● Did your engineering team skip process steps?
  ● Is your field monitoring ignoring SPI violations?





■ Good safety culture mitigates risk
  ● Having a Safety Management System is a start
  ● Safety culture involves everyone in the lifecycle








https://bit.ly/3i5wl57 


■ Safety culture simplified:
  ● Are you incentivized to do the right thing?
  ● Is it OK to tell your boss bad news? Will your boss fix it?

Investigation finds Uber's ‘ineffective safety culture’ to blame for its self-driving car killing a pedestrian last year
https://bit.ly/3epKmdy

David Shepardson, Reuters, Nov 20, 2019, 8:48 AM










National Transportation Safety Board (NTSB) investigators examine a self-driving Uber vehicle involved in a fatal accident in Tempe, Arizona, U.S., 
March 20, 2018. National Transportation Safety Board/Handout via REUTERS 

Positive Trust Balance

■ Positive Trust Balance:
  ● Stakeholders trust that lifecycle risk will be acceptable


[Figure: "Trustworthy Positive Risk Balance"; labels: Engineering, Validation, Feedback, Safety]

Building Trust with Stakeholders

■ Safety transparency

■ Beyond testing to Positive Trust Balance:
  Engineering, Validation, Feedback, Culture

■ Robust safety culture required to succeed